JHU/APL Experiments at CLEF: Translation Resources and Score Normalization
Abstract
The Johns Hopkins University Applied Physics Laboratory participated in three of the five tasks of the CLEF-2001 evaluation: monolingual retrieval, bilingual retrieval, and multilingual retrieval. In this paper we describe the fundamental methods we used and present initial results from three experiments. The first examines whether residual inverse document frequency can improve the term weighting methods used with a linguistically motivated probabilistic model. The second attempts to assess the benefit of various translation resources for cross-language retrieval. The last aims to improve cross-collection score normalization, a task essential for the multilingual problem.
Similar Resources
Cross-Language Retrieval Using HAIRCUT for CLEF 2004
JHU/APL continued to explore the use of knowledge-light methods for scalable multilingual retrieval during the CLEF 2004 evaluation. We relied on the language-neutral techniques of character n-gram tokenization, pre-translation query expansion, statistical translation using aligned parallel corpora, fusion from disparate retrievals, and reliance on language similarity when resources are scarce....
Exploring New Languages with HAIRCUT at CLEF 2005
JHU/APL has long espoused the use of language-neutral methods for cross-language information retrieval. This year we participated in the ad hoc cross-language track and submitted both monolingual and bilingual runs. We undertook our first investigations in the Bulgarian and Hungarian languages. In our bilingual experiments we used several nontraditional CLEF query languages such as Greek, Hunga...
JHU Experiments in Monolingual Farsi Document Retrieval at CLEF 2009
At CLEF 2009 JHU submitted runs in the ad hoc track for the monolingual Persian evaluation. Variants of character n-gram tokenization provided a 10% relative gain over unnormalized words. A run based on skip n-grams, which allow internal skipped letters, achieved a mean average precision of 0.4938. Using traditional 5-grams resulted in a score of 0.4868 while plain words had a score of 0.4463.
JHU/APL Experiments in Tokenization and Non-Word Translation
In the past we have conducted experiments that investigate the benefits and peculiarities attendant to alternative methods for tokenization, particularly overlapping character n-grams. This year we continued this line of work and report new findings reaffirming that the judicious use of n-grams can lead to performance surpassing that of word-based tokenization. In particular we examined: the re...
JHU Ad Hoc Experiments at CLEF 2008
For CLEF 2008 JHU conducted monolingual and bilingual experiments in the ad hoc TEL and Persian tasks. The TEL task focused on searching electronic card catalog records in English, French, and German using data from the British Library, the Bibliotheque Nationale de France, and the Österreichische Nationalbibliothek (Austrian National Library). The approach we adopted for TEL was to st...
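Several of the abstracts above refer to overlapping character n-gram tokenization and to skip n-grams, "which allow internal skipped letters". The following is a minimal sketch of both ideas, not the HAIRCUT implementation. The function names `char_ngrams` and `skip_ngrams` are hypothetical, and the skip variant shown here (drop exactly one internal character from a window of n+1 characters) is an assumption; the cited papers may define skip n-grams differently, for example by marking the skipped position with a placeholder.

```python
from typing import List


def char_ngrams(text: str, n: int = 5) -> List[str]:
    """Return the overlapping character n-grams of `text`.

    Spaces are kept, so n-grams may span word boundaries.
    Text shorter than n yields itself as a single token.
    """
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def skip_ngrams(text: str, n: int = 5) -> List[str]:
    """Assumed skip n-gram variant (illustrative only).

    Slide a window of n + 1 characters over `text` and, for each
    internal position, emit the window with that one character
    removed. The first and last characters are never skipped.
    """
    grams = []
    for i in range(len(text) - n):
        window = text[i:i + n + 1]
        for j in range(1, n):  # internal positions 1 .. n-1
            grams.append(window[:j] + window[j + 1:])
    return grams


if __name__ == "__main__":
    print(char_ngrams("retrieval", 5))
    print(skip_ngrams("retrie", 5))
```

Because neither function depends on a dictionary or a stemmer, the same tokenizer can be applied unchanged to any language, which is the sense in which the abstracts call these methods language-neutral.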